Task-based parallel programming for scalable matrix product algorithms

Authors

Abstract

Task-based programming models have succeeded in gaining the interest of the high-performance mathematical software community because they relieve part of the burden of developing and implementing distributed-memory parallel algorithms in an efficient and portable way. In increasingly large and heterogeneous clusters of computers, these models appear as a way to maintain and enhance more complex algorithms. However, task-based programming models lack the flexibility and the features that are necessary to express, in an elegant and compact way, scalable algorithms that rely on advanced communication patterns. We show that the Sequential Task Flow paradigm can be extended to write compact yet efficient and scalable routines for linear algebra computations. Although this work focuses on dense General Matrix Multiplication (GEMM), the proposed features enable the implementation of more complex algorithms. We describe the implementation of these features and of the resulting GEMM operation. Finally, we present an experimental analysis on two homogeneous supercomputers showing that our approach is competitive up to 32,768 CPU cores with state-of-the-art libraries and may outperform them for some problem dimensions. Although our code can use GPUs straightforwardly, we do not deal with this case because it implies other issues which are out of the scope of this work.
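As a rough illustration of the task-based idea only (not the paper's Sequential Task Flow implementation, which targets distributed-memory runtimes), the sketch below expresses a tiled matrix product as one independent task per output tile. All function and parameter names here are hypothetical:

```python
# Minimal sketch of a task-based tiled GEMM: each output tile C[i,j] is
# computed by its own task. This is shared-memory shorthand, not the
# distributed STF runtime described in the paper.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def tiled_gemm(A, B, tile=2):
    """Compute C = A @ B by submitting one task per output tile.

    Assumes square matrices whose size is a multiple of `tile`.
    """
    n = A.shape[0]
    nt = n // tile
    C = np.zeros_like(A)

    def compute_tile(i, j):
        # Each task owns its output tile exclusively, so the inner
        # reduction over k needs no synchronization.
        acc = np.zeros((tile, tile))
        for k in range(nt):
            acc += A[i*tile:(i+1)*tile, k*tile:(k+1)*tile] @ \
                   B[k*tile:(k+1)*tile, j*tile:(j+1)*tile]
        C[i*tile:(i+1)*tile, j*tile:(j+1)*tile] = acc

    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(compute_tile, i, j)
                   for i in range(nt) for j in range(nt)]
        for f in futures:
            f.result()  # re-raise any exception from a task
    return C
```

Because each task writes a disjoint tile of C, the tasks are trivially independent; real STF runtimes generalize this by inferring dependencies automatically from declared data accesses.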


Similar references

On Parallel Evaluation of Matrix-Based Dynamic Programming Algorithms

Dynamic programming techniques are well established and employed by various practical algorithms, for instance the edit-distance algorithm. These algorithms usually operate in iteration-based fashion where new values are computed from values of the previous iterations, thus they cannot be processed by simple data-parallel approaches. In this paper, we investigate possibilities of employing mult...
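The iteration dependence described above can be sidestepped by filling the DP table along anti-diagonals: every cell on one diagonal depends only on earlier diagonals. A minimal sketch for the edit-distance example (sequential here; the inner loop over a diagonal is the part that could run in parallel — this is a generic wavefront illustration, not the cited paper's method):

```python
def edit_distance_wavefront(a, b):
    """Levenshtein distance, filling the DP table by anti-diagonals.

    Cells with i + j == d depend only on diagonals d-1 and d-2, so the
    inner loop over i is data-parallel within each diagonal.
    """
    m, n = len(a), len(b)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for d in range(m + n + 1):                 # diagonal index: i + j == d
        for i in range(max(0, d - n), min(m, d) + 1):
            j = d - i
            if i == 0:
                D[i][j] = j                    # insert j characters
            elif j == 0:
                D[i][j] = i                    # delete i characters
            else:
                cost = 0 if a[i - 1] == b[j - 1] else 1
                D[i][j] = min(D[i - 1][j] + 1,      # deletion
                              D[i][j - 1] + 1,      # insertion
                              D[i - 1][j - 1] + cost)  # substitution
    return D[m][n]
```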


Highly Scalable Parallel Algorithms for Sparse Matrix Factorization

In this paper, we describe scalable parallel algorithms for sparse matrix factorization, analyze their performance and scalability, and present experimental results for up to 1024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algorithms substantially improve the state of the art in parallel direct solution of sparse linear sys...


Fast and Scalable Parallel Algorithms for Matrix Chain Product and Matrix Powers on Reconfigurable Pipelined Optical Buses

Given N matrices A1, A2, ..., AN of size N × N, the matrix chain product problem is to compute A1 × A2 × ... × AN. Given an N × N matrix A, the matrix powers problem is to calculate the first N powers of A, i.e., A, A^2, A^3, ..., A^N. Both problems are important in conducting many matrix manipulations such as computing the characteristic polynomial, determinant, rank, and inverse of a matrix, and in ...
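For reference, a direct sequential reading of the matrix powers problem looks like the sketch below; the cited work's contribution is solving it fast on reconfigurable pipelined optical buses, which this naive loop does not attempt:

```python
import numpy as np

def matrix_powers(A):
    """Return [A, A^2, ..., A^N] for an N x N matrix A.

    Naive sequential version: N-1 successive multiplications.
    """
    N = A.shape[0]
    powers = [A]
    for _ in range(N - 1):
        powers.append(powers[-1] @ A)   # A^(k+1) = A^k * A
    return powers
```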


Optimising Parallel Logic Programming Systems for Scalable Machines

Logic programs are good examples of symbolic applications that often exhibit large amounts of implicit parallelism and that can greatly benefit from parallel computers. Parallel logic programming (PLP) systems have obtained excellent results for traditional bus-based shared-memory architectures. However, the scalable multiprocessors being developed today pose new challenges, such as the high lat...


Abstractions for Portable, Scalable Parallel Programming

Gail A. Alverson, William G. Griswold, Calvin Lin, David Notkin, Lawrence Snyder. September 24, 1997. Abstract: In parallel programming, the need to manage communication, load imbalance, and irregularities in the computation puts substantial demands on the programmer. Key properties of the architecture, such as the number of processors and the cost of c...



Journal

Journal title: ACM Transactions on Mathematical Software

سال: 2023

ISSN: 0098-3500, 1557-7295

DOI: https://doi.org/10.1145/3583560